Automatically Categorizing Written Texts by Author Gender
Authors
Abstract
The problem of automatically determining the gender of a document's author would appear to be a more subtle problem than those of categorization by topic or authorship attribution. Nevertheless, it is shown that automated text categorization techniques can exploit combinations of simple lexical and syntactic features to infer the gender of the author of an unseen formal written document with approximately 80% accuracy. The same techniques can be used to determine if a document is fiction or non-fiction with approximately 98% accuracy.
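To make "categorization from simple lexical features" concrete, here is a minimal sketch in Python. It is not the method used in the paper, and it makes several assumptions. The only features are relative frequencies of a hypothetical, hand-picked list of function words. The syntactic features mentioned in the abstract are omitted. A plain perceptron is swapped in as the linear learner. The labels +1 and -1 are arbitrary class names, and the toy training strings are invented for illustration, not drawn from any corpus.

```python
import re
from collections import Counter

# Hypothetical feature set: a small, hand-picked list of function words.
# (Assumption for illustration only; not the paper's feature list.)
FUNCTION_WORDS = ["the", "a", "of", "and", "to", "in", "it", "is",
                  "she", "he", "for", "with", "not"]


def features(text):
    """Relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def train_perceptron(docs, labels, epochs=20, lr=1.0):
    """Train a linear classifier with the perceptron rule; labels are +1 or -1."""
    xs = [features(d) for d in docs]
    w = [0.0] * len(FUNCTION_WORDS)
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified or on the boundary: update
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(model, text):
    """Return +1 or -1 for an unseen document."""
    w, b = model
    score = sum(wi * xi for wi, xi in zip(w, features(text))) + b
    return 1 if score > 0 else -1


# Invented toy data; the two classes are arbitrary labels, not real findings.
docs = [
    "the report of the committee of the board",
    "the analysis of the data of the survey",
    "she said it and she knew it",
    "she and i think it is nice",
]
labels = [1, 1, -1, -1]
model = train_perceptron(docs, labels)
print(predict(model, "the history of the war"))  # → 1
print(predict(model, "she thinks it is fine"))   # → -1
```

A real experiment would need many labelled documents, a much larger feature set (including syntactic features), and held-out evaluation such as cross-validation before reporting accuracy figures like those quoted in the abstract.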
Similar resources
Gender Identification in Russian Texts
Gender identification is the task of identifying the gender of the author of a written text. A hybrid approach combining a deep neural network with a rule-based classifier has been designed for Russian texts. LSTM and BiLSTM have been used as part of the neural network because they can learn long-term dependencies.
Automatic Detection of Gender and Number Agreement Errors in Spanish Texts Written by Japanese Learners
This paper describes the creation of a grammar to automatically detect agreement errors (gender and number) in Spanish texts written by Japanese learners. The grammar has been written using the Constraint Grammar formalism (Karlsson et al., 1995), and uses as input the morphosyntactic analysis provided by the Spanish parser HISPAL (Bick, 2006). For developing and testing the grammar, a learner ...
Categorizing spelling errors to assess L2 writing
Based on a corpus of 223 argumentative essays written by English as a foreign language learners, this study shows that spelling errors, whether detected manually or automatically, are a reliable predictor of the quality of L2 texts and that reliability is further improved by subcategorizing errors. However, the benefit derived from subcategorization is much lower in the case of errors automatica...
Using LSA to Automatically Identify Givenness and Newness of Noun Phrases in Written Discourse
Identifying given and new information within a text has long been addressed as a research issue. However, there has previously been no accurate computational method for assessing the degree to which constituents in a text contain given versus new information. This study develops a method for automatically categorizing noun phrases into one of three categories of givenness/newness, using the tax...
A Comparative Study of Metadiscourse in Academic Writing: Male vs. Female Authors of Research Articles in Applied Linguistics
Like conversation and other modes of communication, writing is a rich medium for gender performance. In fact, writing functions to construct the disciplines as well as the gender of its practitioners. Despite the significance of author gender, as one constitutive dimension of any writing, it has been relatively under-researched. One way, by means of which author gender is practiced, and reveale...
Journal title:
- LLC
Volume 17, Issue
Pages -
Publication date: 2002